NetNews Offline 2




Internet Message Format | 1996-08-05 | 1.4 KB

Path: mail2news.demon.co.uk!genesis.demon.co.uk
From: Lawrence Kirby <fred@genesis.demon.co.uk>
Newsgroups: comp.lang.c
Subject: Re: Need code to remove non-adjacent duplicate lines
Date: Sat, 30 Mar 96 20:49:01 GMT
Organization: none
Message-ID: <828218941snz@genesis.demon.co.uk>
References: <1996Mar27.113154.14694@schbbs.mot.com>
Reply-To: fred@genesis.demon.co.uk
X-NNTP-Posting-Host: genesis.demon.co.uk
X-Newsreader: Demon Internet Simple News v1.27
X-Mail2News-Path: genesis.demon.co.uk

In article <1996Mar27.113154.14694@schbbs.mot.com> ghelm "george_helm" writes:

>Does anyone have C code that will remove *non-adjacent* duplicate
>lines from an ascii file ? I need to retain the original file
>format so I can't use simple stuff like sort -u or uniq in UNIX.
>Help on this is greatly appreciated.

How you approach this depends on whether the file is small enough to be
stored in memory. If it is then you build a lookup datastructure keyed on
the contents of the line. If a new line matches an entry in the
datastructure you ignore it, otherwise add it and output it. The following
demonstrates the algorithm:

    awk ' {
        if (!a[$0]) {
            a[$0] = 1
            print
        }
    }'

I leave it as an exercise for the reader to translate this to C! :-)

--
-----------------------------------------
Lawrence Kirby | fred@genesis.demon.co.uk
Wilts, England | 70734.126@compuserve.com
-----------------------------------------
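
For readers who want to try the exercise, here is one possible C translation
of the awk one-liner. It is only a sketch and not the poster's code. It
assumes the whole file fits in memory, as the post does, and uses a
fixed-size chained hash table as the lookup structure. The names
dedup_lines, read_line, hash_line and NBUCKETS are made up for this
illustration, and the hash function and table size are arbitrary choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4096u   /* arbitrary table size, chosen for this sketch */

/* One chain entry: an owned copy of a line that has already been output. */
struct node {
    struct node *next;
    char *line;
};

/* Simple multiplicative string hash; any reasonable string hash would do. */
static unsigned long hash_line(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Read one line of any length with the newline stripped.  Returns a
 * malloc'd string, or NULL at end of file or on failure. */
static char *read_line(FILE *in)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap), *tmp;
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep room for the terminator */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) { free(buf); return NULL; }
    buf[len] = '\0';
    return buf;
}

/* Copy in to out, dropping every line that has appeared earlier.
 * Returns 0 on success, -1 on a read or allocation failure. */
int dedup_lines(FILE *in, FILE *out)
{
    struct node **table, **slot, *p, *next;
    char *line;
    size_t i;
    int rc = 0;

    assert(in != NULL && out != NULL);
    table = malloc(NBUCKETS * sizeof *table);
    if (table == NULL)
        return -1;
    for (i = 0; i < NBUCKETS; i++)
        table[i] = NULL;

    while ((line = read_line(in)) != NULL) {
        slot = &table[hash_line(line) % NBUCKETS];
        for (p = *slot; p != NULL; p = p->next)
            if (strcmp(p->line, line) == 0)
                break;
        if (p != NULL) {                 /* seen before: ignore it */
            free(line);
            continue;
        }
        p = malloc(sizeof *p);
        if (p == NULL) { free(line); rc = -1; break; }
        p->line = line;                  /* the table now owns the string */
        p->next = *slot;
        *slot = p;
        fputs(line, out);                /* first occurrence: output it */
        putc('\n', out);
    }
    if (rc == 0 && !feof(in))            /* stopped before end of file */
        rc = -1;

    for (i = 0; i < NBUCKETS; i++)
        for (p = table[i]; p != NULL; p = next) {
            next = p->next;
            free(p->line);
            free(p);
        }
    free(table);
    return rc;
}
```

To use it as a filter, call dedup_lines(stdin, stdout) from main and return
EXIT_FAILURE if the result is nonzero. Note that a final line with no
trailing newline is written out with one, so the output can differ from the
input by that one character.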